WBC-AMNet: Automatic classification of WBC images using deep feature fusion network based on focalized attention mechanism

The recognition and classification of White Blood Cells (WBCs) play an important role in the diagnosis of blood-related diseases (e.g., leukemia, infections). Because different WBC subtypes have highly similar morphology, it is difficult to classify WBCs effectively and accurately by visual observation of blood cell smears. This paper proposes a Deep Convolutional Neural Network (DCNN) with feature fusion strategies, named WBC-AMNet, for automatically classifying WBC subtypes based on a focalized attention mechanism. To obtain more localized attention from the CNN, the fused features of the first and the last convolutional layers are extracted by a focalized attention mechanism that combines Squeeze-and-Excitation (SE) and Gather-Excite (GE) modules. The new method successfully classifies monocytes, neutrophils, lymphocytes, and eosinophils against a complex background with an overall accuracy of 95.66%, better than that of general CNNs. The multi-classification accuracy of WBC-AMNet with background segmentation is over 98% in all cases. In addition, Gradient-weighted Class Activation Mapping (Grad-CAM) is employed to visualize the attention heatmaps of different feature maps.


Introduction
The analysis of White Blood Cell (WBC) images can assist clinical experts in diagnosing many blood-related disorders such as leukopenia, Acute Leukemia (AL), and agranulocytosis. Importantly, AL is a malignant clonal disease of hematopoietic stem cells. Without special therapy, the average survival period is about three months, and some patients even die within a few days of diagnosis. AL is commonly classified into Acute Lymphoblastic Leukemia (ALL) and Acute Myelogenous Leukemia (AML) [1]. The survival rate of AML within five years is 40% [2], and in five cases in Europe, the annual survival rate of the disease is only 19% [3]. Therefore, the automated detection and classification of WBC sample images are of considerable reference value for leukemia diagnosis.
However, the presence of staining impurities and cytoplasm with low image contrast makes the microscopic differences between WBCs more challenging to distinguish [4,5]. Several sample images from the two datasets, with and without background segmentation, are shown in Fig 1.

Methods
The pipeline of our approach for classifying WBC images is described as follows. First, the WBC images from the online source are taken from blood smears under microscopes and labeled by experts.
Then, the images are pre-processed: all images are resized to 224 × 224 pixels to fit the model; random rotation (by an angle of -14 * 15 degree), cropping, and flipping are used to reduce the effects of irrelevant information and noise in the images; and color distortion is applied to make the images clearer. In color distortion, the brightness, contrast, saturation, and chromaticity of the image are adjusted by random factors taking values in (0, 1).
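The flipping and color-distortion steps can be illustrated with a minimal NumPy sketch. It is not the authors' pre-processing code: the function name `augment`, the `strength` parameter, and the way the random factor in (0, 1) enters each adjustment are assumptions made for illustration, and rotation, cropping, saturation, and chromaticity are omitted for brevity.

```python
import numpy as np

def augment(image, rng, strength=0.2):
    """Randomly flip and color-jitter an image.

    image: float array of shape (H, W, 3) with values in [0, 1].
    rng: a numpy.random.Generator.
    strength: hypothetical jitter magnitude (not given in the text).
    """
    out = image
    # Random horizontal flip with probability 0.5.
    if rng.random() < 0.5:
        out = out[:, ::-1, :]
    # Brightness: draw a factor in (0, 1) and map it to a scale near 1.
    f_brightness = rng.uniform(0.0, 1.0)
    out = out * (1.0 + strength * (2.0 * f_brightness - 1.0))
    # Contrast: stretch or shrink pixel values around the image mean.
    f_contrast = rng.uniform(0.0, 1.0)
    mean = out.mean()
    out = (out - mean) * (1.0 + strength * (2.0 * f_contrast - 1.0)) + mean
    # Keep the result in the valid intensity range.
    return np.clip(out, 0.0, 1.0)

rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))   # stand-in for a resized WBC image
augmented = augment(image, rng)
```

The input image is left unmodified, so the same source image can be augmented repeatedly with different random draws during training.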

PLOS ONE
After that, the images are input to the proposed WBC-AMNet, part of whose parameters are pre-trained on the ImageNet dataset, to train and fine-tune the model. WBC-AMNet implements the proposed focalized attention mechanism, and Grad-CAM is used to visualize the attention.
Finally, the classification results of the WBC images are obtained to assist in diagnosis.

WBC-AMNet model
Our WBC-AMNet model is based on a DCNN architecture with a focalized attention mechanism. The DCNN architecture mainly follows WBCsNet [13] and uses the group convolution strategy.
Group convolution strategy. The group convolution strategy was employed in ResNeXt, which adopts the stacking idea of VGG and the split-transform-merge idea of Inception [18]. Group convolution improves accuracy while reducing the number of hyperparameters, as sketched in Fig 2b. It not only significantly reduces the amount of model computation [19] but also improves the accuracy of WBC-AMNet.
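The saving from group convolution can be checked by counting weights. The sketch below assumes a bias-free 2D convolution; the helper name `conv_params` and the layer sizes (128 channels, 3 × 3 kernel, 32 groups) are illustrative choices, not values taken from Table 2.

```python
def conv_params(c_in, c_out, kernel, groups=1):
    """Number of weights in a bias-free 2D convolution with `groups` groups."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    # Each group maps c_in/groups input channels to c_out/groups output channels.
    return groups * (c_in // groups) * (c_out // groups) * kernel * kernel

standard = conv_params(128, 128, 3)             # 147456 weights
grouped = conv_params(128, 128, 3, groups=32)   # 4608 weights
print(standard, grouped, standard // grouped)   # 147456 4608 32
```

With g groups, the weight count (and the multiply-accumulate count per output position) falls by a factor of g for the same input and output widths.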
The focalized attention mechanism is mainly realized by the Squeeze-and-Excitation (SE) module and the Gather-Excite (GE) module.
The Squeeze-and-Excitation (SE) module is a computational module constructed based on the CAM [20]. The core idea of the SE module is to model the interdependencies between feature channels: global average pooling squeezes the WBC feature map, the excitation step performs a nonlinear transformation, and the result is finally superimposed on the input features to recalibrate the feature channels by adaptive learning. The structure of the SE module is shown in Fig 2c. Let $U = [u_1, u_2, \ldots, u_C] \in \mathbb{R}^{H \times W \times C}$ be the input, and let $K = [k_1, k_2, \ldots, k_C]$ denote the learned set of filter kernels, where $k_c$ denotes the parameters of the $c$-th filter ($k_c = [k_c^1, k_c^2, \ldots, k_c^C]$). After a series of transformations, we get $F = [f_1, f_2, \ldots, f_C]$, then (Eq 1) [21]:

$$f_c = k_c * U = \sum_{i=1}^{C} k_c^i * u_i,$$

where $*$ denotes convolution. Thus, we can obtain the result F after the SE module (Eq 2).
Based on the SE module, the Gather-Excite (GE) module further exploits the feature context in the CNN by introducing a pair of operators, $\xi_G$ (step-wise deep convolution) and $\xi_E$ [22]. The core of the GE module is to apply different filters layer by layer to the feature map, which lets WBC-AMNet aggregate the extracted WBC features accordingly. The gather operator effectively aggregates feature responses over a large spatial extent, and the excite operator then redistributes the aggregated information to local features. The structure of the GE module is displayed in Fig 2d. After processing by the GE module, WBC-AMNet focuses on local features more precisely, which greatly improves its feature extraction ability.
The fused features of the first and last convolutional layers in the model are input to the SE module, while the feature maps of the last convolutional layer are input to the GE module. Then, the features of the SE module and the GE module are fused to obtain the attentional features, which are finally fused with the original features of the last convolutional layer.
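The data flow described above can be sketched in NumPy under several stated assumptions. The SE block follows the standard squeeze (global average pooling), excitation (two fully connected layers with ReLU and sigmoid), and channel-wise rescaling. The GE block here is a simplified, parameter-free variant with global extent rather than the step-wise convolution used for $\xi_G$. Every fusion step is assumed to be element-wise addition, and the first-layer features are assumed to have already been brought to the shape of the last-layer features. All function names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def se_block(feat, w1, w2):
    """Standard SE recalibration. feat: (H, W, C); w1: (C, C//r); w2: (C//r, C)."""
    z = feat.mean(axis=(0, 1))                   # squeeze: one value per channel
    s = sigmoid(np.maximum(z @ w1, 0.0) @ w2)    # excitation: FC, ReLU, FC, sigmoid
    return feat * s                              # rescale each channel

def ge_block_global(feat):
    """Simplified gather-excite: gather by global average, excite by sigmoid gating."""
    gathered = feat.mean(axis=(0, 1), keepdims=True)   # shape (1, 1, C)
    return feat * sigmoid(gathered)

def focalized_attention(first, last, w1, w2):
    """Assumed fusion order; every fusion is element-wise addition (an assumption)."""
    se_out = se_block(first + last, w1, w2)   # fused first/last features into SE
    ge_out = ge_block_global(last)            # last-layer features into GE
    attention = se_out + ge_out               # attentional features
    return attention + last                   # fuse with the original last-layer features

rng = np.random.default_rng(0)
first = rng.random((7, 7, 8))
last = rng.random((7, 7, 8))
w1 = rng.normal(size=(8, 2))   # reduction ratio r = 4 (illustrative)
w2 = rng.normal(size=(2, 8))
out = focalized_attention(first, last, w1, w2)
```

Both blocks only rescale channels, so the output keeps the spatial size and channel count of the last convolutional layer and can be passed to the classifier head unchanged.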
The output of the model depends on the types of WBC in the dataset. Fine-tuning is used to obtain the optimal parameters of WBC-AMNet through transfer learning and a gradient learning rate strategy. Fig 2a depicts the focalized attention mechanism with the DCNN architecture for WBC image classification, where the ReLU activation function is $\mathrm{ReLU}(x) = \max(0, \omega^{T} x + b)$. $\otimes$ is the element-wise multiplication operator: the inputs X and Y are multiplied element by element, and the output element at each position is stored in the returned result, $Out = X \otimes Y$. $\oplus$ is the element-wise addition operator: the inputs X and Y are added element by element, and the output element at each position is stored in the returned result, $Out = X \oplus Y$.
The idea of the focalized attention mechanism guides the construction of WBC-AMNet, which is based on the SE-ResNeXt backbone and implements the GE module and the group convolution strategy (Table 2).
SE-ResNeXt. The SE module is used directly with the residual networks of the ResNeXt model to build SE-ResNeXt [20]. This attention mechanism significantly improves the performance of the ResNeXt model with no additional computational cost.

Attention visualization
WBC-AMNet only outputs numerical results such as accuracy, which makes it difficult to understand intuitively which essential features and locations the model finally extracts. To explain the effect of the focalized attention mechanism more vividly, we visualize the feature extraction and attention heatmaps of WBC-AMNet using the Grad-CAM method [23]. Assume that the penultimate layer produces m feature maps $F^m$ ($F^m \in \mathbb{R}^{H \times W}$ for any C) and that $F^m_{ij}$ is the activation of $F^m$ at location (i, j). Grad-CAM obtains the gradient information of the score $g^c$ for class C and uses the average value of all gradients as the weight of the feature map:

$$\alpha^c_m = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{\partial g^c}{\partial F^m_{ij}}.$$

After weighting the extracted features, the ReLU operation finally highlights the crucial regions in the WBC images through the class-discriminative localization map $L^c_{\text{Grad-CAM}}$ (Eq 4):

$$L^c_{\text{Grad-CAM}} = \mathrm{ReLU}\left(\sum_{m} \alpha^c_m F^m\right).$$

Grad-CAM does not require retraining of the proposed model, and it visualizes the local positions in the WBC image on which WBC-AMNet bases its final decision.
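The two Grad-CAM steps, averaging the gradients into one weight per feature map and applying ReLU to the weighted sum, can be sketched in NumPy as follows. The sketch assumes that the feature maps and the gradients of the class score with respect to them have already been obtained from the network; the function name `grad_cam` and the toy inputs are illustrative.

```python
import numpy as np

def grad_cam(feature_maps, gradients):
    """Class-discriminative localization map.

    feature_maps: array (M, H, W), activations of the chosen layer.
    gradients:    array (M, H, W), d(class score) / d(feature_maps).
    Returns an (H, W) heatmap.
    """
    # One weight per feature map: the spatial average of its gradients.
    weights = gradients.mean(axis=(1, 2))                 # shape (M,)
    # Weighted sum of the feature maps over the map index.
    cam = np.tensordot(weights, feature_maps, axes=1)     # shape (H, W)
    # ReLU keeps only regions with a positive influence on the class score.
    return np.maximum(cam, 0.0)

# Toy example: two 2 x 2 feature maps with constant gradients.
fmaps = np.stack([np.ones((2, 2)), 2.0 * np.ones((2, 2))])
grads_pos = np.ones((2, 2, 2))
grads_mixed = np.stack([np.ones((2, 2)), -np.ones((2, 2))])
heat_pos = grad_cam(fmaps, grads_pos)       # 1*1 + 1*2 = 3 everywhere
heat_mixed = grad_cam(fmaps, grads_mixed)   # 1*1 - 1*2 = -1, clipped to 0
```

In practice the heatmap is upsampled to the input resolution and overlaid on the WBC image, which is a display step not shown here.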

Evaluation
This paper analyzes classification performance using indexes including Accuracy (Eq 5), Specificity (Eq 6), Precision (Eq 7), and F1-score (Eq 8). They are calculated as follows:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{5}$$

$$\text{Specificity} = \frac{TN}{TN + FP} \tag{6}$$

$$\text{Precision} = \frac{TP}{TP + FP} \tag{7}$$

$$\text{F1-score} = \frac{2\,TP}{2\,TP + FP + FN} \tag{8}$$

Among them, True Positive (TP) denotes positive samples correctly identified as positive; False Positive (FP) denotes negative samples incorrectly identified as positive; True Negative (TN) denotes negative samples correctly identified as negative; False Negative (FN) denotes positive samples incorrectly identified as negative.
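For a multi-class problem, these indexes are typically computed per subtype in a one-vs-rest fashion from the confusion matrix. The sketch below assumes that convention (the text does not state how the per-class values are aggregated); the function name and the example matrix are illustrative and are not results from the paper.

```python
def per_class_metrics(confusion, k):
    """One-vs-rest metrics for class k.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    Assumes class k has at least one predicted and one true sample.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                          # class k predicted as another class
    fp = sum(confusion[i][k] for i in range(n)) - tp     # other classes predicted as k
    tn = total - tp - fn - fp
    return {
        "accuracy": (tp + tn) / total,
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class).
example = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
metrics0 = per_class_metrics(example, 0)
```

For class 0 in this example, TP = 8, FN = 2, FP = 1, and TN = 19, which gives an accuracy of 0.9, a specificity of 0.95, a precision of 8/9, and an F1-score of 16/19.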

Tri-classification results of BCCD
We perform a tri-classification analysis of WBC images from BCCD. Combined immune receptors based on T cell receptors are present not only in T cells but also in neutrophils and monocytes, and both cell types are derived from the granulocyte-monocyte progenitor cell [24,35]. We therefore treat monocytes and neutrophils as one set, named MTD. The results of the search over the epoch and batch size parameters, together with the corresponding evaluation indexes, are given in Table 3.
Detailed data for WBC subtypes with different parameters are presented in S1 Table in S1 File. Fig 3 shows the change of the objective function value during training. As the number of iterations increases, the training accuracy improves rapidly in the initial stage and then gradually converges to 1.00. Our model reaches optimal performance when epoch = 20 and batch size = 32, at which point the accuracy reaches 95.66%. Under the optimal parameters, we analyze in detail the recognition and classification ability of WBC-AMNet for the three WBC subtypes. In Table 4, the   Fig 4b. Both types of cells are easily confused with each other, whereas lymphocytes do not appear to be misclassified at all. The misidentification between these two WBC subtypes is also a common challenge for existing classifiers [11,25].
The accuracies of MobileNet-V1, MobileNet-V2, ResNet, DistResNet, and SE-ResNeXt are all over 93%, but WBC-AMNet still improves on them significantly. The accuracy of WBC-AMNet is nearly twice that of VGG, and the other evaluation metrics also differ significantly. The simple structure of VGG makes it less practical for WBC  VGG (Fig 5a) is a conventional CNN model, which identifies all WBC subtypes as MTD. Compared with MobileNetV2 (Fig 5b), WBC-AMNet reduces the misclassification of eosinophils as MTD, lowering the number of misclassified eosinophils by nearly one third.

ResNet (Fig 5c) addresses the problem of misclassifying MTD as eosinophils to a certain extent. The introduction of the focalized attention mechanism allows WBC-AMNet to direct attention to valuable features. SE-ResNeXt (Fig 5d) reduces the misclassification of MTD as eosinophils and shows unexpected results in predicting MTD. The combination of the focalized attention mechanism and feature fusion allows WBC-AMNet to further localize its attention.

Quad-classification results of BCCD
Monocytes and eosinophils are essential references for diagnosing diseases such as monocytic leukemia and an underlying allergic state, respectively [33,34]. Accordingly, quad-classification is performed using WBC-AMNet for eosinophils, monocytes, lymphocytes, and neutrophils, with approximately 2480 training images and 620 test images for each WBC subtype. Based on the results of the tri-classification parameter search, we reuse its optimal parameters (epoch = 20, batch size = 32); the statistical results for the different WBC subtypes are shown in Table 6. A small proportion of monocytes and eosinophils are misclassified, which results in their slightly lower accuracy. Detailed data for WBC subtypes with different parameters are presented in S3 Table in S1 File. Comparing with the results in Table 4, we find that lymphocytes still maintain a high classification accuracy. However, after the MTD set is split back into its subtypes, the accuracy for eosinophils decreases, and the accuracy for neutrophils and monocytes is also not high. We speculate that a misclassification problem occurred [35]. Although the accuracy for neutrophils is high, the predictions are not as good as they should be, resulting in a low F1-score. Conversely, although the accuracy for monocytes is low, the predictions for them are highly precise. This indicates that WBC-AMNet has a good classification ability for neutrophils, but the precision is higher for monocytes. Fig 6a shows that the classification results are not satisfactory except for lymphocytes, which is reflected in the confusion matrix in Fig 6b. The quad-classification method identified monocytes and eosinophils as neutrophils several times, verifying our speculation. Other WBCs are identified as neutrophils more often, but there are no cases of neutrophils identified as monocytes. Compared with monocytes, WBC-AMNet extracts the features of neutrophils more accurately.
Based on the 11 CNN models, the results and statistical data are shown in Table 7. Compared with Table 5, the classification ability of VGG is significantly improved on the quad-classification problem. The new model, with the introduction of the focalized attention mechanism, has a significantly higher accuracy than ResNet. Feature fusion makes the classification accuracy of WBC-AMNet better than that of SE-ResNeXt, the best existing model. The accuracy and the other metrics reflect the importance of feature fusion in guiding the model to extract and process features.

Classification results of WBCs dataset
We use our method to classify the WBC images from the WBCs dataset. None of these WBC images has a complex background. In this section, we compare our method with 3 representative methods. From Tables 5 and 7, it can be found that MobileNetV2 has a higher accuracy rate. Comparing ResNet and SE-ResNeXt with WBC-AMNet, respectively, shows the effect of introducing the attention mechanism and the GE module. Using the same parameters (epoch = 20, batch size = 32), we train MobileNetV2, ResNet, and SE-ResNeXt and compare them with WBC-AMNet.
Tri-classification of WBCs dataset. First, the tri-classification results for the different WBC subtypes are analyzed in Table 8. Slightly different from the BCCD, the classification accuracy for neutrophils is higher in the WBCs dataset, regardless of the model. Except for the intermediate cell, the classification accuracy of WBC-AMNet is above 99%, which is a satisfactory result. However, the accuracy for the intermediate cell is slightly lower, and we reclassify it in the next section to further explore the reason. As shown in Fig 7, our model has the best performance among the 4 methods in terms of the 4 indexes: Accuracy, Specificity, Precision, and F1-score.
The accuracy of WBC-AMNet combined with the focalized attention mechanism is nearly 4% higher than that of ResNet. WBC-AMNet also achieves more than 1% higher accuracy than SE-ResNeXt, not only due to the introduction of the GE module but also thanks to the operation of feature fusion. Regardless of the model, the classification accuracy for intermediate cells is lower than that for other cells. For lymphocytes, however, the accuracy of the proposed model has increased, which is the main reason for the increase in total accuracy. The AUCs of lymphocytes and neutrophils in Fig 8 are both 1.00, and the AUCs of intermediate cells and of the whole are 0.99. From Fig 8b, we can observe that a tiny number of intermediate cells (i.e., MTD) are still misclassified as lymphocytes, which differs from the conclusion on the BCCD. We suspect that this is caused by different misclassification problems arising in different contexts. The confusion matrices of the four models are depicted in S5 Fig in S1 File.
In Fig 9, the solid lines of different colors represent the ROC curves of different WBC subtypes, and the blue dashed line represents the overall macro-average ROC curve. The AUC of MobileNetV2 (Fig 9a) is 0.98, and the AUC of ResNet and SE-ResNeXt is 0.99. WBC-AMNet improves the classification of all three WBC subtypes. From Fig 9a and 9b, the light blue curve is slightly lower, which intuitively shows that the classification capabilities of
Quad-classification of WBCs dataset. The classification results and statistics of our method and the three compared methods for the four WBC subtypes are listed in Table 9. The classification rate of WBC-AMNet for eosinophils is almost twice as high as that of MobileNetV2 and ResNet. Since both the tri-classification and quad-classification results are improved, we conclude that WBC-AMNet achieves the best performance among the four methods.
MobileNetV2 has serious misclassification problems when recognizing eosinophils and monocytes, resulting in low accuracy for these two WBC subtypes. Except for eosinophils, ResNet has a higher accuracy for the other WBC subtypes. However, since the number of eosinophil images in the WBCs dataset is small, this has little effect on the overall accuracy. With the SE module, the accuracy of SE-ResNeXt is significantly improved. For eosinophils in particular, SE-ResNeXt is about 35% higher than ResNet. Such a large increase in accuracy verifies the importance and effectiveness of the attention mechanism strategy. The classification accuracy for monocytes is still not satisfactory. By integrating the GE module, the recognition accuracy of WBC-AMNet for monocytes is improved by nearly 10% compared with SE-ResNeXt. Moreover, WBC-AMNet has an accuracy of over 95% for each WBC subtype. We can therefore conclude that WBC-AMNet achieves effective WBC classification on the WBCs dataset. Fig 10 shows that for quad-classification, our model still obtains higher scores for Accuracy, Specificity, Precision, and F1-score than the other 3 methods. In Fig 12, MobileNetV2 has an AUC of 0.98, ResNet has an AUC of 0.99, and SE-ResNeXt and WBC-AMNet have an AUC of 1.00. The closer the AUC is to 1.00, the better the performance of the model. The AUC values show that the performance of the model improves after the attention mechanism is introduced. Comparing Fig 12 with Fig 11a, the AUCs of MobileNetV2 and ResNet are lower on eosinophils and monocytes, the ROC curve of SE-ResNeXt on monocytes is slightly lower, and WBC-AMNet reaches 1.00 on all WBC subtypes, which means that WBC-AMNet is significantly improved compared with the other CNN models.
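The per-subtype and overall AUC values can be illustrated with a small pure-Python sketch. It computes the one-vs-rest AUC as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, and takes the macro-averaged AUC as the unweighted mean over classes. This is one common convention and is assumed here; the text does not specify how the reported values were computed, and the function names and toy scores are illustrative.

```python
def auc_one_vs_rest(scores, labels):
    """AUC for binary labels (1 = positive, 0 = negative); ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def macro_auc(score_matrix, true_classes, n_classes):
    """Unweighted mean of per-class one-vs-rest AUCs.

    score_matrix[i][k] is the predicted score of sample i for class k.
    """
    aucs = []
    for k in range(n_classes):
        scores_k = [row[k] for row in score_matrix]
        labels_k = [1 if t == k else 0 for t in true_classes]
        aucs.append(auc_one_vs_rest(scores_k, labels_k))
    return sum(aucs) / n_classes

perfect = auc_one_vs_rest([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])   # 1.0
partial = auc_one_vs_rest([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])   # 0.75
```

An AUC of 1.00 for a subtype therefore means that every image of that subtype is scored above every image of the other subtypes, regardless of the decision threshold.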

Visualization analysis
The attention of different feature maps of WBC-AMNet is visualized on images with single-cell background segmentation. First, a heatmap is obtained by a regular convolution operation. This first heatmap has highlighted regions spread over almost the whole WBC image, so the attention is highly dispersed. To focus the attention, the strategies of the focalized attention mechanism and feature fusion are further introduced. The features of the first and last convolutional layers in the model are fused and fed into the SE module. At this point, the area of the highlighted region in the heatmap is significantly reduced, and the red part starts to accumulate in the cell nucleus. Then, the feature map of the last convolutional layer is input to the GE module, and the features of the SE and GE modules are fused to obtain the attentional features. The attention area is slightly reduced further, and almost all of it lies on the WBC nuclei. Finally, the attentional features are fused with the original features of the last convolutional layer. The final heatmap reflects the superiority of the WBC-AMNet model. By implementing the focalized attention mechanism and deep feature fusion, attention is highly focused on vital, local regions of the WBC nuclei. Our proposed method extracts the effective critical information in the WBC nuclei and avoids the influence of too much redundant and invalid information on the results (Fig 13).

Conclusion
In this paper, we propose a new DCNN, WBC-AMNet, for the automatic classification of WBC images based on a focalized attention mechanism and a deep feature fusion strategy. The attention of different feature maps of WBC-AMNet is visualized using the Grad-CAM method, which shows that the model extracts the critical information from the WBC nuclei and avoids the influence of too much redundant and invalid information on the results. Experimental results show that WBC-AMNet performs better than several existing models. Although the classification performance of our model is satisfactory, the mathematical mechanism of the network architecture is still unclear. In the future, we intend to study the deep learning network from a mathematical perspective and to test our model on more medical image data.